library(tidyverse)
library(tigris)
library(sf)
library(mapview)
library(tidytransit)
library(osrm)

Sys.setenv(CENSUS_KEY="b88495f8315f45b2d100dee9ba8f4c489a7371c2")

options(
  tigris_class = "sf",
  tigris_use_cache = TRUE
)

projection <- "+proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=ft +no_defs"

# SET your path here to the P drive.
path <- 'P:/SFBI/Restricted Data Library/'

Each transit provider publishes GTFS files. SamTrans files can be found at https://www.samtrans.com/Developer.html. Best to load this directly from the website to always stay as updated as possible.

gtfs <- read_gtfs("https://www.samtrans.com/Assets/GTFS/samtrans/ST-GTFS.zip")
summary(gtfs)
## GTFS object
## files        agency, calendar, routes, stop_times, stops, trips, calendar_dates, fare_attributes, fare_rules, shapes
## agency       SamTrans
## service      from 2020-04-26 to 2020-06-20
## uses         stop_times (no frequencies)
## # routes       30
## # trips      2997
## # stop_ids   1416
## # stop_names 1067
## # shapes       74

GTFS files have a standard structure and the functions within tidytransit are specifically designed to process them. Fuller introduction can be found at https://cran.r-project.org/web/packages/tidytransit/vignettes/introduction.html.

Note in the summary that the time period is specific, so all stop details will be based on this time period. We separately have ridership (APC) data from 2019. It’s possible that stop and route geometries have changed throughout this time period. With the APC we could reconstruct past routes if necessary, but this is only necessary if we encounter stop IDs that don’t exist in the most recent GTFS, or other major differences.

Here’s how to get stops and quickly map them.

stops <- 
  st_as_sf(gtfs$stops,coords=c(6,5)) %>% 
  st_set_crs(4326) %>% 
  st_transform(projection)

head(stops)
## Simple feature collection with 6 features and 8 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: 1785951 ymin: 13648190 xmax: 1792014 ymax: 13664300
## CRS:            +proj=utm +zone=10 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=ft +no_defs
## # A tibble: 6 x 9
##   stop_id stop_code stop_name stop_desc zone_id stop_url location_type
##   <chr>   <chr>     <chr>     <chr>     <chr>   <chr>            <int>
## 1 311010  311010    Terra No~ <NA>      1       <NA>                 0
## 2 311013  311013    Bradford~ <NA>      1       <NA>                 0
## 3 311019  311019    Clarendo~ <NA>      1       <NA>                 0
## 4 311020  311020    Clarendo~ <NA>      1       <NA>                 0
## 5 311025  311025    Crespi D~ <NA>      1       <NA>                 0
## 6 311026  311026    Crespi D~ <NA>      1       <NA>                 0
## # ... with 2 more variables: parent_station <chr>, geometry <POINT [ft]>
mapview(stops)

These stop locations may be all that’s necessary for our primary analysis, since we are mainly interested in the relationship between population locations and bus stops. But I’ll demonstrate a few other functions from tidytransit that I’ve used before.

Say you wanted to filter to stop times in a specific time window. Note how you need to feed in hours as 1 to 24, and then convert to seconds.

start_hour <- 16
end_hour <- 20

stop_times <- filter_stop_times(gtfs, "2020-06-20", start_hour*3600, end_hour*3600)

head(stop_times)
##                            trip_id arrival_time departure_time stop_id
## 1: 10754177-132-Blocks-Saturday-50     16:18:00       16:18:00  336623
## 2: 10754177-132-Blocks-Saturday-50     16:19:00       16:19:00  336036
## 3: 10754177-132-Blocks-Saturday-50     16:27:00       16:27:00  335637
## 4: 10754177-132-Blocks-Saturday-50     16:28:00       16:28:00  335503
## 5: 10754177-132-Blocks-Saturday-50     16:29:00       16:29:00  335162
## 6: 10754177-132-Blocks-Saturday-50     16:29:00       16:29:00  335620
##    stop_sequence pickup_type drop_off_type timepoint arrival_time_num
## 1:             1           0             0         1            58680
## 2:             2           0             0         0            58740
## 3:             3           0             0         1            59220
## 4:             4           0             0         0            59280
## 5:             5           0             0         0            59340
## 6:             6           0             0         1            59340
##    departure_time_num
## 1:              58680
## 2:              58740
## 3:              59220
## 4:              59280
## 5:              59340
## 6:              59340

The resultant dataframe is similar in structure to the APC data. The stop IDs should match between GTFS and APC. It is not yet clear if any other IDs match.

Here’s how to get stop frequencies: the number of times each stop is visited in the given time window. There is a resultant field called departures. You may want to use this to identify “high frequency” stops (e.g. 3 or more departures per hour).

stop_frequency <-
  get_stop_frequency(
    gtfs,
    start_hour,
    end_hour
  )

head(stop_frequency)
## # A tibble: 6 x 6
##   route_id direction_id stop_id service_id            departures headway
##   <chr>           <int> <chr>   <chr>                      <int>   <int>
## 1 110-180             0 311074  132-Blocks-Weekday-50          2     120
## 2 110-180             0 311074  132-Blocks-Weekday-51          2     120
## 3 110-180             0 311078  132-Blocks-Weekday-50          2     120
## 4 110-180             0 311078  132-Blocks-Weekday-51          2     120
## 5 110-180             0 311081  132-Blocks-Weekday-50          2     120
## 6 110-180             0 311081  132-Blocks-Weekday-51          2     120

The most valuable function in tidytransit is raptor(), which creates an origin-destination style matrix with the earliest arrival times for all destination stops from a chosen origin stop within a time window. The parameters required are pretty involved, so I’d recommend you read the documentation carefully online: http://tidytransit.r-transit.org/reference/raptor.html. Below I manually select the stops in RWC’s transit station as the departure points.

rwc_stops <- c("344631","344633","344636","344637")

rptr <- 
  raptor(
    stop_times, 
    gtfs$transfers, 
    rwc_stops, 
    departure_time_range = (end_hour-start_hour)*3600,
    keep = "shortest"
  )

rptr[1:10,]
##     stop_id travel_time journey_departure_stop_id journey_departure_time
##  1:  344633           0                    344633                  57840
##  2:  344636           0                    344636                  58200
##  3:  344637           0                    344637                  58320
##  4:  344631           0                    344631                  59400
##  5:  344095          60                    344636                  58440
##  6:  344084          60                    344633                  66900
##  7:  344070         120                    344636                  58440
##  8:  344086         120                    344633                  70200
##  9:  344083         180                    344636                  58440
## 10:  344080         180                    344636                  64380
##     min_arrival_time transfers
##  1:            57840         0
##  2:            58200         0
##  3:            58320         0
##  4:            59400         0
##  5:            58500         0
##  6:            66960         0
##  7:            58560         0
##  8:            70320         0
##  9:            58620         0
## 10:            64560         0

The first few trips are 0 seconds because they are to the same stops as you started with.

The following maps the stops that can be reached from RWC in the given time window, colored by the travel time in minutes.

mapview(
  rptr %>% 
    left_join(stops, by = "stop_id") %>% 
    st_as_sf() %>% 
    mutate(minutes = travel_time/60),
  zcol = "minutes",
  layer.name = "Travel times<br>from RWC"
)

Let’s say you wanted to figure out which POIs were accessible from RWC station by bus + walking within 10 minutes. We could filter to the stops within a travel time of 10 minutes, keeping track of whatever time is left over, and then create walking isochrones from those stops, then grab all POIs within those zones. I’ll demonstrate this first with “as the crow flies” circles from the stops, and then show how to refine with OSRM network routing.

max_time_minutes <- 10

accessible_stops <-
  rptr %>% 
  mutate(
    travel_time_minutes = travel_time/60,
    walking_time_minutes = max_time_minutes - travel_time_minutes
  ) %>% 
  filter(travel_time_minutes <= max_time_minutes) %>% 
  left_join(stops, by = "stop_id") %>% 
  st_as_sf()

accessible_zone <-
  1:nrow(accessible_stops) %>% 
  map_dfr(function(row){
    accessible_stops[row,] %>% 
      st_buffer(accessible_stops$walking_time_minutes[row]/20*5280) %>% # convert minutes to ft assuming 20 minutes per mile walking
      as.data.frame()
  }) %>% 
  st_as_sf() %>% 
  st_set_crs(projection)

mapview(accessible_zone)

Get Safegraph POIs and retain only the ones within this zone:

poi_ca <- readRDS(paste0(path,'Safegraph/covid19analysis/ca_poi.rds'))

smc <- counties("CA",cb=F,progress_bar=F) %>% 
  filter(NAME == "San Mateo") %>% 
  st_transform(4326)

accessible_POIS <-
  poi_ca %>% 
  st_as_sf(coords = c("longitude","latitude")) %>% 
  st_set_crs(4326) %>% 
  .[smc,] %>% 
  st_transform(projection) %>% 
  .[accessible_zone,]

mapview(accessible_POIS)

The same approach could be adapted to start from CBGs, blocks, or parcels and figure out accessibility to any destination. You’d need to first figure out walking distance to the nearest stops and then run raptor() from those stops, accounting for the possibility that it’s not necessarily the closest stop within walking distance that gets you to some destination the fastest.

If you want to switch from “as the cross flies” to a more realistic “network” route, then we need to use the osrm package.